
    The Universe at Extreme Scale: Multi-Petaflop Sky Simulation on the BG/Q

    Remarkable observational advances have established a compelling, cross-validated model of the Universe. Yet two key pillars of this model -- dark matter and dark energy -- remain mysterious. Sky surveys that map billions of galaxies to explore the 'Dark Universe' demand a corresponding extreme-scale simulation capability; the HACC (Hybrid/Hardware Accelerated Cosmology Code) framework has been designed to deliver this level of performance now and into the future. With its novel algorithmic structure, HACC allows flexible tuning across diverse architectures, including accelerated and multi-core systems. On the IBM BG/Q, HACC attains unprecedented scalable performance -- currently 13.94 PFlops at 69.2% of peak and 90% parallel efficiency on 1,572,864 cores with an equal number of MPI ranks and a concurrency of 6.3 million. This level of performance was achieved at extreme problem sizes, including a benchmark run with more than 3.6 trillion particles, significantly larger than any cosmological simulation yet performed.
    Comment: 11 pages, 11 figures, final version of paper for talk presented at SC1
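    The reported figures are internally consistent: a minimal sketch, assuming the BG/Q's nominal per-core peak of 12.8 GFlop/s (1.6 GHz × 8 flops/cycle; a hardware assumption, not stated in the abstract), recovers the 69.2%-of-peak claim from the 13.94 PFlops sustained rate.

```python
# Hypothetical check of the reported HACC numbers, assuming a BG/Q
# per-core peak of 12.8 GFlop/s (1.6 GHz x 8 flops/cycle).

CORES = 1_572_864        # full 96-rack BG/Q core count, as reported
PEAK_PER_CORE = 12.8e9   # flop/s per core (assumption about the hardware)
SUSTAINED = 13.94e15     # flop/s sustained, as reported

peak = CORES * PEAK_PER_CORE         # machine peak, ~20.13 PFlop/s
fraction_of_peak = SUSTAINED / peak  # ~0.692, matching the 69.2% claim

print(f"machine peak: {peak / 1e15:.2f} PFlop/s")
print(f"fraction of peak: {fraction_of_peak:.1%}")
```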

    Scalable Reactive Molecular Dynamics Simulations for Computational Synthesis

    Reactive molecular dynamics (MD) simulation is a powerful research tool for describing chemical reactions. We eliminate the speed-limiting charge iteration in MD with a novel extended-Lagrangian scheme. The extended-Lagrangian reactive MD (XRMD) code drastically improves energy conservation while substantially reducing time-to-solution. Furthermore, we introduce a new polarizable charge equilibration (PQEq) model to accurately predict atomic charges and polarization. The XRMD code, based on hybrid message passing + multithreading, achieves a weak-scaling parallel efficiency of 0.977 on 786,432 IBM Blue Gene/Q cores for a 67.6 billion-atom system. The performance is portable to the second-generation Intel Xeon Phi, Knights Landing. Blue Gene/Q simulations enable the computational synthesis of materials via novel exfoliation mechanisms for producing the atomically thin transition metal dichalcogenide layers that are expected to dominate nanomaterials science in this century.
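    For readers unfamiliar with the metric, a weak-scaling parallel efficiency such as the reported 0.977 is conventionally computed as the ratio of the base-partition runtime to the full-machine runtime, with the per-core problem size held fixed. The timings below are invented for illustration; they are not measurements from the XRMD runs.

```python
# Hypothetical sketch of a weak-scaling efficiency calculation.
# Timings are illustrative, not from the paper.

def weak_scaling_efficiency(t_base: float, t_scaled: float) -> float:
    """Efficiency = base time / scaled time, with the per-core
    problem size held fixed as the core count grows."""
    return t_base / t_scaled

# e.g. 100 s on the smallest partition vs. 102.35 s on 786,432 cores
eff = weak_scaling_efficiency(100.0, 102.35)
print(f"weak-scaling efficiency: {eff:.3f}")
```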

    Improving GPU Performance Prediction with Data Transfer Modeling

    Accelerators such as graphics processors (GPUs) have become increasingly popular for high performance scientific computing. Often, much effort is invested in creating and optimizing GPU code without any guaranteed performance benefit. To reduce this risk, performance models can be used to project a kernel's GPU performance potential before it is ported. However, raw GPU execution time is not the only consideration. The overhead of transferring data between the CPU and the GPU is also an important factor; for some applications, this overhead may even erase the performance benefits of GPU acceleration. To address this challenge, we propose a GPU performance modeling framework that predicts both kernel execution time and data transfer time. Our extensions to an existing GPU performance model include a data usage analyzer for a sequence of GPU kernels, to determine the amount of data that needs to be transferred, and a performance model of the PCIe bus, to determine how long the data transfer will take. We have tested our framework using a set of applications running on a production machine at Argonne National Laboratory. On average, our model predicts the data transfer overhead with an error of only 8%, and the inclusion of data transfer time reduces the error in the predicted GPU speedup from 255% to 9%.
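    A minimal sketch of the kind of model the abstract describes: total GPU time as kernel time plus a PCIe transfer term modeled as fixed latency plus bytes over bandwidth. All parameter values and the helper names below are illustrative assumptions, not the paper's fitted model.

```python
# Hypothetical speedup model including CPU<->GPU transfer overhead.
# Latency and bandwidth values are assumptions for illustration.

PCIE_LATENCY = 10e-6    # seconds per transfer (assumed)
PCIE_BANDWIDTH = 6e9    # effective bytes/s over PCIe (assumed)

def transfer_time(n_bytes: float) -> float:
    """Fixed-latency-plus-bandwidth model of one host<->device transfer."""
    return PCIE_LATENCY + n_bytes / PCIE_BANDWIDTH

def predicted_speedup(cpu_time: float, kernel_time: float,
                      bytes_moved: float) -> float:
    """Speedup over the CPU once transfer overhead is included."""
    gpu_total = kernel_time + transfer_time(bytes_moved)
    return cpu_time / gpu_total

naive = 1.0 / 0.05                         # kernel-only prediction: 20x
real = predicted_speedup(1.0, 0.05, 2e9)   # with ~0.33 s of transfers
print(f"naive: {naive:.1f}x, with transfers: {real:.2f}x")
```

Ignoring the transfer term is exactly the failure mode the abstract quantifies: the kernel-only prediction can overstate the achievable speedup by a large factor.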